ABSTRACT

Most data retrieval systems combine both batch and on-line features, as a function of the nature of the data bases and the nature of the queries. Two functionally different retrieval systems in current use at G. D. Searle, one primarily on-line oriented and one batch-oriented, can serve as models for discussion. A data base of 500,000 records related to pharmacology screenings is used in a batch system. Since the data are highly structured and comprised of a small number of data types, the searcher has no need to interact with the data base on a sampling basis. A batch system is therefore employed. [The remainder of the abstract, which goes on to describe a second system, is illegible in the source.]

In data retrieval, what is frequently encountered is a combination of batch and on-line modes, where the combination ratio has been determined by a number of factors. Among these factors are the following: What is the nature of the data? In the data base, how many unit records are there, and what kinds of entries are there for each record? A second factor relates to the type of query generally encountered. Are the queries rather straightforward requests for selected lists of stored data, such as "Find all occurrences of A" or perhaps "Find all occurrences of A with B"? Or do the queries require more complex logical combinations, such as "Find all A and B or C and D but not any F, G, or H"? Most important, however, is whether there is any need to see a sample of hits before proceeding to the next step of the query or to another query. Also, what volume of information is generated by the query? Another factor is who searches the system. What and how much does the searcher have to know about the indexed content of the system to search it successfully? A fourth factor is what I've called "computer capabilities", which includes not only the size of the computer and its associated hardware and software, but also the available modes of data storage, and the expertise and cooperation of the analysts and programmers to build and maintain a system tailored to meet retrieval needs. Finally, a point that needs no discussion -- how much does it all cost?

At G. D. Searle, there are a number of functionally different retrieval systems in use for searching scientific information. One of these is searched in a mode that is primarily batch, while another system is primarily on-line. A discussion of the two systems will, hopefully, illustrate how batch and on-line modes have been combined to meet scientific retrieval needs.

The first system that I'll discuss is the one that is primarily batch. It contains pharmacology screening data on Searle compounds. There are approximately 500,000 records stored sequentially on tape. The unit record is 72 characters long, divided into 12 fixed fields, each of which is designated for entries of a particular type, such as test category, compound tested, dose, results, date reported, etc. For any 72-character string, the data contained in the 12 fields are linked to each other only, and not to any other 72-character string. This means that associations can be made within one string, but not from one string to another. Searching this system is a one-step process in batch mode. The searcher accesses the computer from a remote terminal and specifies, in a question and answer mode, the desired combination of data elements to be searched. In this case, I've requested data on 2 hypothetical compounds when they are active in a hypothetical test category. I have specified the fields on which the hits should be sorted, a job number is assigned, and I have signed off to wait for the computer operator to run the search against the data-containing tape. When the entire tape has been read, I will receive a sorted printout of the data.

This essentially batch system is appropriate in this case for several reasons. Because the system contains a small number of data types (i.e., compound number, date, dose, results, etc.) one can specify quite precisely what is needed and anticipate the nature of what will be retrieved. There is really no need to see and evaluate a sample of the data before receiving it all, since the searcher knows beforehand what to expect. Also, since there are relatively few search requests, the tape may be mounted on request and turnaround time is short. The off-line printing of relevant data in this system is advantageous since the amount of data generated is often quite large. Since there are only twelve data types, and the computer questions are in a standard format, search requests can be defined exactly by the requestor and the search run by anyone who knows how to access the computer. Reading the entire data tape constitutes a significant element of the search cost, but updating costs are reduced since additions to the data tape are made by adding new data in sequence to the end of the file.
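
The batch search described above (read every 72-character record, test its fixed fields, then sort the hits) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not give the field widths or their order, so the sketch assumes twelve equal 6-character fields with test category, compound, dose, and result in positions 0 to 3, and the helper names `make_record`, `parse_record`, and `batch_search` are hypothetical.

```python
RECORD_LENGTH = 72                           # unit record length given in the talk
FIELD_COUNT = 12                             # number of fixed fields given in the talk
FIELD_WIDTH = RECORD_LENGTH // FIELD_COUNT   # ASSUMPTION: equal 6-character fields


def make_record(*values):
    """Build one 72-character record from up to 12 short field values."""
    if len(values) > FIELD_COUNT or any(len(v) > FIELD_WIDTH for v in values):
        raise ValueError("too many fields or a value wider than its field")
    return "".join(v.ljust(FIELD_WIDTH) for v in values).ljust(RECORD_LENGTH)


def parse_record(record):
    """Split one fixed-length record into its 12 fields."""
    if len(record) != RECORD_LENGTH:
        raise ValueError("record must be exactly 72 characters")
    return tuple(
        record[i:i + FIELD_WIDTH].strip()
        for i in range(0, RECORD_LENGTH, FIELD_WIDTH)
    )


def batch_search(tape, criteria, sort_fields):
    """Read the whole tape once, keep matching records, return them sorted.

    criteria maps a field position to the set of acceptable values;
    sort_fields lists the field positions to sort the hits on.
    """
    hits = []
    for record in tape:                      # sequential: every record is read
        fields = parse_record(record)
        if all(fields[i] in allowed for i, allowed in criteria.items()):
            hits.append(fields)
    # Fields are compared as text, so numeric fields sort lexically here.
    return sorted(hits, key=lambda f: tuple(f[i] for i in sort_fields))


# Hypothetical tape: test category, compound, dose, result.
tape = [
    make_record("T01", "C001", "10", "ACT"),
    make_record("T01", "C002", "5", "INACT"),
    make_record("T02", "C001", "10", "ACT"),
    make_record("T01", "C002", "20", "ACT"),
]
hits = batch_search(
    tape,
    criteria={0: {"T01"}, 1: {"C001", "C002"}, 3: {"ACT"}},
    sort_fields=[1, 2],
)
```

Because each record is tested on its own fields only, the sketch also shows why associations can be made within one string but not across strings, and why the cost of a search is dominated by reading the entire tape.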


The second system I'll discuss is a mostly on-line, highly interactive, multilevel data retrieval system called MUTAGENS. It contains published information on mutagenic, carcinogenic, and teratogenic chemical compounds and agents. The information in the system has come from two sources. The first, and larger, source is the Environmental Mutagens Information Center at Oak Ridge National Laboratory, which has provided us with its tapes of indexed information. Each record consists of a bibliographic reference, chemical and/or physical agents studied, organism studied, test object, and CAS Registry Number. To each record containing a chemical compound we have added the Wiswesser Line Notation for its chemical structure. The second source of information in this system is our own indexing of published papers on mutagenesis. We index papers in the same format as at EMIC, but have expanded the indexing to include data on dosage, length and time of treatment with a mutagen, the type and strain of organism studied, mechanism of action, mutation frequencies, and cell type, as well as the Wiswesser Line Notation for the chemical structures. There are presently 220,000 records, 10,000 of which are coded chemical structures.

The information in the unit record is of four types: bibliographic references; keywords such as animals, organisms, assay type, mechanism of action, etc.; fixed field entries such as dosage and treatment schedules; and strings of characters of variable length such as WLN or comments. The information from both sources is stored on disk in a random access, inverted file format. That is, one can get any record in the data file without having to read any other record.

This system is searched in a purely on-line mode. After accessing the computer from a remote terminal, the searcher first specifies the desired search level: Dictionary, Index, or Data. The dictionary level here is similar to the neighbor function that is in several data systems. Namely, the searcher can get an idea of the contents of the data base by questioning what the file contains. Here, I've asked to see a listing of all occurrences of the strains of bacteria derived from Salmonella typhimurium. The 35 types are listed, as is the frequency of their occurrence, and an indication of how they are used in the system. The search question is formulated at the index level and the number of hits can be listed by the system. Relevant hits are flagged by the computer, and several queries can be made in succession. The searcher can then proceed to the data level, where any or all of the flagged hits can be displayed. The searcher has the option of listing the reference number only (i.e., the number referring to the hard copy of the hit) or listing the entire entry. For example, here I've asked for references to MNNG when it is used to induce frameshift or base substitution changes in Salmonella typhimurium. There are 4 hits in the system, and I have had one of them printed out fully.

This system is well suited to our retrieval needs for several reasons. Complex search strategies can readily be accommodated by the use of Boolean logic and the logical operators and, or, and not. The searcher can proceed in a series of logical steps, sampling the results of one step before proceeding to the next. In this way, it is possible to minimize the number of non-relevant hits by either pursuing a successful strategy or abandoning an unsuccessful line. The inclusion of a dictionary level permits a preliminary examination of the contents as well as the mechanisms of mutagenesis. Since the system actually reads a small number of entries in searching for relevant hits, the cost per search is low. On the other hand, updating the random access system is more costly than updating the sequentially stored system, since each new record must be split up and the segments inserted appropriately. In our case, the efficiency, accuracy, and low search cost offset the cost of update.
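
The three search levels and the Boolean operators described above can be illustrated with a small in-memory inverted file. This is a sketch under assumptions, not the MUTAGENS implementation: the class name `InvertedFile`, its methods, and the sample records and strain labels are all hypothetical, and Python sets stand in for the disk-resident index.

```python
from collections import defaultdict


class InvertedFile:
    """Toy inverted file with dictionary, index, and data levels."""

    def __init__(self):
        self.records = {}               # data level: record number -> full entry
        self.index = defaultdict(set)   # index level: term -> record numbers

    def add(self, number, entry, terms):
        # Updating is the costly step: the record is split into its terms
        # and each term's posting set must be touched.
        self.records[number] = entry
        for term in terms:
            self.index[term.lower()].add(number)

    def dictionary(self, prefix):
        """Dictionary level: matching terms and how often each occurs."""
        prefix = prefix.lower()
        return sorted(
            (term, len(ids))
            for term, ids in self.index.items()
            if term.startswith(prefix)
        )

    def hits(self, term):
        """Index level: record numbers for one term (no records are read)."""
        return set(self.index.get(term.lower(), set()))

    def display(self, numbers, full=False):
        """Data level: reference numbers only, or the entire entries."""
        return [(n, self.records[n]) if full else n for n in sorted(numbers)]


# Hypothetical records for illustration only.
f = InvertedFile()
f.add(1, "Reference 1", ["MNNG", "frameshift", "Salmonella typhimurium strain A"])
f.add(2, "Reference 2", ["MNNG", "base substitution", "Salmonella typhimurium strain B"])
f.add(3, "Reference 3", ["MNNG", "frameshift", "yeast"])
f.add(4, "Reference 4", ["other agent", "frameshift", "Salmonella typhimurium strain A"])

# MNNG and (frameshift or base substitution) and not yeast
flagged = (f.hits("MNNG") & (f.hits("frameshift") | f.hits("base substitution"))) - f.hits("yeast")
strains = f.dictionary("salmonella")
```

Each intermediate set can be inspected before the next operator is applied, which is the step-by-step sampling the talk describes; only the flagged record numbers are ever fetched at the data level.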




MUTAGENS is part of a larger retrieval system which is still under development. Ultimately the search modes in MUTAGENS will be implemented so that dosage or treatment can be specifically associated with a compound. In addition, the variable length strings (WLN, comments) and the fixed field entries (dose schedule, treatment schedule) will be string-searchable. Ultimately, also, the same type of system will be used to handle the primarily batch system I discussed earlier, since it is becoming more desirable to associate that data in more complex combinations.
In summary, I think one can conclude that the choice and/or combination of batch or on-line modes of data retrieval depends on the needs and resources of the user. I hope to have demonstrated this in the last few minutes and will be happy to discuss these systems further in the discussion period.
Coping with information retrieval demands has led librarians, information and discipline specialists to greater use of both batch and on-line services. A balance of the two services, plus some old-fashioned tools, is needed to match demands to available resources. Over 5,000 citations are catalogued and indexed daily, and most of these eventually become accessible by computer. With this abundance of material and increasing number of users, many searches are repeated. Consideration should be given to publishing searches for which there is a high market demand, perhaps through regional services that publicize the intent to make a given search.

Coping with information retrieval also has led librarians, information and discipline specialists to greater use of batch and on-line services. A balance of the two services, coupled with some old-fashioned tools, may be needed if the demands of today are to match the resources available. Can users efficiently merge multiple bibliographic data bases without assistance from suppliers? What should the data base output format be? Can users' needs be aggregated to make the output a more efficient use of the computer?

I would like to briefly explore these thoughts as we prepare to open for full panel discussion.

Just considering the membership of the National Federation of Abstracting and Indexing Services, over 5,000 citations are cataloged and indexed daily, according to the Services' 1975 membership report. I don't know what the daily figure for the universe is, but 5,000 per day is a staggering figure, and most of those citations end up in someone's machine accessible bibliographic data base sooner or later. Over 40 bibliographic data bases were cited in the Association of Scientific Information Dissemination Centers' 1972 "Survey of Information Center Services" as publicly available and used for retrospective searching. Thirty-seven were cited as used for selective dissemination of information purposes. I'm not sure all the system innovations in on-line and batch retrieval techniques we can think of will help us find the resources needed to access the millions and billions of bits of information stored in computers today.

I would like to quote from Bill Knox's "Pathology of Information".1

"Our laws and practices assume that there is a scarcity of information and that a good society results from a maximum flow of information of all kinds. But we have moved, in a generation, into an era when the average citizen suffers, not from a scarcity, but from an over-abundance of information."

1 "The Pathology of Information", William T. Knox, [publication name illegible in source], June 1971.

Is it possible to harness this over-abundance? For example, how many searches are being asked of bibliographic data bases every day? Whether batch or on-line techniques are used, it seems to make no difference, for in a recent progress report on a study being conducted for NSF, results were reported showing on-line searching equally cost effective as batch searching. Another estimate is that 1.5 million terminals will be operating in the United States in 1980. So, shouldn't we start asking ourselves some questions? How many times are the same searches repeated in the thousands of terminals now linked to bibliographic data bases? You may say, so what? Redundancy in this business is a way of life, and we must accept the fact that our data bases overlap, and our questions may duplicate one another. But at what cost? When labor costs, computer costs, terminals, paper, postage, and all else are considered, what price can users really afford to pay for a bibliographic search? And think of the added cost when the same question asked of one data base must be placed against other data bases to insure comprehensive coverage for the searcher. In your own experience, how many data bases must be searched to give you reasonable satisfaction -- Three? Five? Ten? To formulate and retrieve the answer to a search, charges of $50 - $200 were common back in the 1972 ASIDIC Survey. Indeed, several ranged above the $1,000 mark.

Is it possible to turn this question around and ask -- Is there a better way to take advantage of the batch and on-line technology available today?

Can we as information scientists think of ways to anticipate our users' needs?

If we can better anticipate and provide market aggregation of select searches, what would happen to unit costs per individual satisfied?

Does it make sense to think of a balance between published bibliographies, where market aggregation is possible, and on-line or batch services, where market aggregation is not possible, that will help us to conserve scarce resources?

NUMERIC DATA BASES

Let's extend the problem from bibliographic to statistical or numeric data bases stored in computer. Let's use as an example a request to pull from a statistical data base an analysis of geographic areas in the United States where the characteristics of the inhabitants make the areas candidates for a proposed consumer product. To develop the strategy, pull data off the file, and amortize the cost of loading and maintaining the data may cost as much as $1,200.00. Since retrieval of information from statistical data bases often takes place off-line, how about taking a different look at market aggregation possibilities? If ten people needed the same information contained in our example, look what happens to the cost. It is reduced from $1,200.00 for one user to $120.00 per user.

There are problems. How could such market aggregation take place?

Perhaps through a regional service, notice of intent to retrieve data in response to a specific need could be announced in a newsletter or other mailing. The notice of intent would represent, for those having comparable or similar data needs, an invitation to join in the data retrieval costs for the search (which would be estimated). Or, perhaps, a more timely method would link all patrons of a certain statistical data base through a conferencing network. This would quickly enable the patron to vote "Yes" or "No" on sharing the front-end cost of a proposed interrogation.

A suspense date or cutoff for RSVPing would be cited, and the data would be developed with an ad hoc consortium. If no one responded, the initiator would pay the full cost (or not proceed with the interrogation). If there were five respondents, the cost would be shared in fifths, 10 respondents in 10ths, and so on.

Now, in conclusion, I'm sure many of you have thought about this problem and probably have better ideas along this line. Most important is that we continue to think and work together solving retrieval needs, so that we do not become overwhelmed by the mass of information that is now accessible by machine.

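
The cost-sharing arithmetic behind the ad hoc consortium proposed in this talk (the section on numeric data bases) can be sketched as follows. The function name `share_per_participant` is hypothetical, and the sketch assumes an equal split among whoever ends up sharing the cost; the talk does not say whether the initiator is counted among the "fifths" or "tenths", so that choice is left to the caller.

```python
def share_per_participant(total_cost, participants):
    """Equal split of the front-end cost among the sharing parties.

    ASSUMPTION: 'participants' is the number of parties sharing the cost,
    however the consortium chooses to count the initiator.
    """
    if participants < 1:
        raise ValueError("at least one party must bear the cost")
    return total_cost / participants


ESTIMATED_COST = 1200.00   # figure used in the talk's example

alone = share_per_participant(ESTIMATED_COST, 1)    # no one responds
five = share_per_participant(ESTIMATED_COST, 5)     # shared in fifths
ten = share_per_participant(ESTIMATED_COST, 10)     # shared in tenths
```

The per-user share falls in inverse proportion to the number of participants, which is the whole case for announcing a search before running it.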
A COMPARISON OF COSTS BETWEEN ON-LINE
AND FAST-BATCH SEARCHING


Daniel U. Wilde


ABSTRACT

The presence of a dedicated computer solely for information retrieval at NERAC creates a unique situation with regard to turnaround time. Therefore, NERAC operates a fast-batch system. Delivery of results to requestors is generally faster than that of material searched on-line and printed off-line, as is the normal case for most current "on-line" searches. Costs for the number of searches done at NERAC in a year, if done at the average cost of current on-line searches, would be close to $600,000.00. This compares to a total cost for fast-batch of $190,000, which includes all equipment, file, personnel, and overhead expenses.


BACKGROUND


The New England Research Application Center (NERAC) was established at the University of Connecticut in 1966 as part of NASA's Technology Utilization Program. The purpose of the Center is to help business and industry gain access to appropriate technical and business information. NERAC attempts to take technology that was invented at one location and put it to work at another. In other words, NERAC tries to help business and industry benefit by using someone else's technology.

It is well accepted that technology transfer is best accomplished by human being to human being communication. Consequently, NERAC uses technical specialists as interfaces between its users and its data sources. All of NERAC's technical information specialists have graduate degrees and years of industrial experience in their various subject fields. Here, users can themselves be information specialists at participating industrial firms or at other technical information centers. They can be company engineers or corporate officers who are the actual end users of NERAC's information.

Perhaps the singular difference between NERAC and other Information Analysis Centers is that NERAC has its own computer dedicated to serving just the participants of its Technology Utilization program. Presently, NERAC is searching 18 different abstract files on this inhouse machine. These include all major files, such as Chemical Abstracts CONDENSATES, Engineering Index, Inspec, SPIN, etc., and the complete NASA STAR and IAA files.

By having its own machine dedicated to just its participants, NERAC's technical information specialists need not worry about turnaround time. NERAC's computer staff operates its machine three shifts a day, seven days a week, and can search all files in a 24-hour period, if and when the need arises. NERAC's technical information specialists do not complain about turnaround time. Consequently, NERAC's mode of operation might be called a "fast batch" system.

COST CONSIDERATIONS 


It is very difficult to compare on-line versus batch because no center has yet run both types of operations in parallel long enough to generate complete and valid data. However, the staff of NC/STRC has done perhaps the "best" job and has actually run comparisons using the same people on the same problems using parallel systems. Nevertheless, it is possible to make some comparisons of NERAC's fast batch system with various on-line systems.

In a typical month during mid-1975, NERAC's computer processed an average of 1,000 retrospective file searches and 4,000 current awareness file searches. Here, a retrospective file search is defined as a search of one complete file, and a current awareness file search as one profile run against one update issue. Thus, an Engineering Index profile would be run once a month while a Chemical Abstracts CONDENSATES profile would be run four or five times per month.

For online expenses, it is necessary to make some assumptions because the literature only indicates costs for very quick lookup searches without exhaustive file examinations. Consequently, online costs are understated. Nevertheless, an estimated price for an online retrospective search, including citation and abstract printing costs, can be expected to exceed $30.00 for a complete and thorough examination of a typical file. Likewise, a cost of $5.00 per current awareness search is reasonable. Using the above search loads and prices, it would cost NERAC close to $600,000 per year to perform its searching via an outside online service.
Thus, NERAC's fast-batch operation costs approximately one third as much as an external online system. Furthermore, NERAC's system has recently been upgraded and still has unused search capacity.
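
The arithmetic behind the $600,000 estimate and the one-third ratio can be checked with a short calculation. The constant and function names below are hypothetical; the figures are the ones quoted in this paper, and the $30.00 price is stated there as a figure the real cost "can be expected to exceed", so the on-line total is best read as a lower bound.

```python
MONTHS_PER_YEAR = 12

RETROSPECTIVE_PER_MONTH = 1_000      # retrospective file searches per month
CURRENT_AWARENESS_PER_MONTH = 4_000  # current awareness file searches per month

ONLINE_RETROSPECTIVE_PRICE = 30.00   # dollars per search (a lower bound)
ONLINE_CURRENT_AWARENESS_PRICE = 5.00

FAST_BATCH_ANNUAL_COST = 190_000.00  # all equipment, file, personnel, overhead


def annual_online_cost():
    """Yearly cost if the same search load were bought from an on-line service."""
    monthly = (
        RETROSPECTIVE_PER_MONTH * ONLINE_RETROSPECTIVE_PRICE
        + CURRENT_AWARENESS_PER_MONTH * ONLINE_CURRENT_AWARENESS_PRICE
    )
    return MONTHS_PER_YEAR * monthly


online = annual_online_cost()
ratio = FAST_BATCH_ANNUAL_COST / online   # fast-batch cost as a fraction of on-line
```

The monthly on-line bill comes to $30,000 for retrospective searches plus $20,000 for current awareness searches, which gives the $600,000 annual figure; $190,000 is a little under a third of that.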
NERAC is organized to help its participants. When users telephone, our technical staff is available to answer their questions immediately. Callers do not have to wait because NERAC's staff are at terminals with interactive searches in progress. Participants are never put on hold or told to call back. NERAC tries to never keep users waiting.

As a result, NERAC time-shares its technical information specialists. They wait for user calls while they are reviewing and evaluating searches that were run the previous night and while they are designing strategies for the next night. When a user calls, the appropriate technical specialist is ready at his or her desk, available to dialogue the request. Once the user has finished the call, the user is free to go on about his more important work while NERAC's experienced technical specialists do the search work.

Finally, NERAC maximizes the use of its computer. The computer is a tool that should be used whenever possible to help participants. In contrast, if NERAC were buying service on-line, its costs would go up as more searches were run. Increasing costs would inhibit greater usage.

NERAC does everything possible to encourage its users to submit many questions. Because NERAC's search costs are fixed, it can pass the savings on to its participants. This makes it easier for them to ask more questions. And, when a user asks a lot of questions, they are sure to benefit from NERAC's service.
ASIS PANEL - "SYSTEM [word illegible in source] FOR ON-LINE VS. BATCH"

[Speaker name illegible in source]


ABSTRACT

If "on-line" is considered synonymous with "instantaneous" and "batch" is considered synonymous with "delayed", the system at NC/STRC does not fit these concepts. On-line systems used with off-line printing (the typical situation) often engender delays measured in days. And batch searching of a computer in the next room can provide very fast, if not instantaneous, results.

A comparison of the same questions against the same data bases was made by NC/STRC staff. Quality of results was found to be a function of staff member capability and time spent formulating the query. For a given searcher, quality was independent of system mode. Search costs are higher for an on-line system, but the disparity decreases as searches get larger. For a small search, on-line cost is 1.6 times as much as batch, while for a large search, it was only 1.2 times as much.


Like NERAC, NC/STRC is a NASA-supported, industry-oriented information center. Like Searle, we use both batch and on-line search systems.

[...] arrangements for this panel to speak first or last, it would be useful to give you my definitions of the terms "on-line" and "batch". These are polarized words which evoke unstated assumptions when we hear them.

(Figure 1) To me, "on-line" means a system in which the user's terminal is connected by a communications link to a (usually distant) computer; a system which accepts data and commands from the user and responds to the user "immediately", as if the user were engaged in a conversation with the system. The nature of the response can vary widely, from simple acknowledgement that a job has been received for later processing, to the examination of indexes and files in order to develop a search strategy, to the actual search itself, with the user able to display or print search output at his terminal.

(Figure 2) "Batch" is perceived by many as the return to yesterday's computer systems, with data and commands being made machine-readable by keypunching, and the cards transported to a computer center where they are grouped and run against a set of files at one time. Thus, "batch" means, to most people, long delays while questions wait for processing.

(Figure 3) Let me point out that, while an on-line search usually provides some assurance that the right question has been asked, and that answers will be relevant, the complete set of answers is ordinarily delayed for days by printing them at the computer center and mailing them to the user's location.

In a few cases, where a "quick fix" is sought, and any relevant information will solve the problem, the delay for mailing the complete output is of no consequence.

(Figure 4) What about systems that fall between the extremes of batch and on-line as I have defined them? One such system is in operation at our center in North Carolina. My system diagram looks very much like the on-line system I showed first; in fact, the diagram is identical. What are the differences between systems? The first important difference is in the amount of interactive dialogue between the system and the user. Instead of referring to computer-stored indexes and thesauri displayed at the terminal to construct a search strategy, our analyst uses indexes and printed thesauri at his desk. Instead of browsing intermediate results displayed on a terminal, our people browse their results in the printed journals. The final search strategy is entered interactively via a terminal, with the computer response limited to checking for validity and completeness in the strategy.

The second important difference between our system and the typical batch system is in our use of search files mounted on direct access storage devices. This makes it economic to run single searches on demand, since the specific parts of the file needed can be examined without searching the entire file.

(Figure 5) While our system can provide answers in minutes, the computer center we use has an incentive pricing system which places a premium on quick responses. Our normal time for search output is overnight, but we can get results in an hour or two if we need them. And our results are complete when received.

Dan Wilde has referred to a study we made about a year and a half ago; some of you may have seen reports on it by Fred Smetana of NCSU. What we did was this: we used our staff of information specialists and a few less-skilled people (like me) to search the NASA data base in two ways. The first method was to use our batch search system in the way I have just described. The second way was to make an on-line search of the identical NASA data base using a terminal connected to the NASA computer then in College Park, MD. We arranged for each person to search the same question using both systems. To minimize the effects of learning from the first search on the quality of the second search, we alternated the system which was used first, and we withheld the full search output until both searches had been made.

What were the results? We expected that the analyst at his desk would spend less time in developing his strategy than the person at the terminal, because of system delays in responding, poor typing skills, and so on. This did not occur. The total time spent varied widely from person to person, but most people spent about the same time per question at the desk or at the terminal.

Our evaluation of search quality showed that, regardless of system used, the more time spent on the strategy, the better the search results. In general, both searches by an individual were equal in quality.

As for search costs, we applied commercial rates to our use of the NASA RECON on-line system and found that our on-line searches cost from two to three times as much as the batch searches. I have said that labor was about the same for both systems; the big difference was in computer costs. Input data was keypunched for these batch searches; the on-line system was then on a 360/50 machine with relatively slow response due to high user loading.


Within the last month we made test searches of the ERIC data base using Lockheed's DIALOG system and our system with interactive terminal entry of strategy. (Figures 6 & 7) Here the cost difference is less. Our terminal entry system was designed for student use, and is relatively slow because of the tutorial features included. The Lockheed system has been optimized for quick response, and our analyst is skilled in its use. (Figure 8) The cost basis for this comparison is shown here.


In conclusion, what are the trends, and what improvements are needed to make mechanized searching more effective at lower cost? It is clear that the return of search output by mail from a distant computer center is the principal source of delay for on-line searches now. The next upgrading of the computer networks, such as Tymshare's Tymnet, through which many users communicate with on-line services, might well include the location of high-speed printers at most of the network nodes; this would be similar to the Mailgram service now offered by Western Union and the Postal Service.

A belief which I hold is that much of the demand for searches comes from small and medium-sized organizations which cannot support or keep busy a skilled searcher. After the initial glamour of "I can do it myself at my own terminal" passes, many of these searches will be given to intermediaries at local and regional search centers. The computer systems used by these centers will be designed to meet economic constraints which are now, for the largest part, ignored by users who are fascinated with the ability to talk to the computer.


Note: The cost of 48¢ per minute for 
communications shown in Figure 8 is for a 
direct-dialed call from Research Triangle Park 
to Palo Alto. Now that a Tymsat has been 
located in the Park, that cost drops to about 
17¢ per minute. 
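
To put the two communications rates in the note above side by side, the following sketch computes the connect charge for a single search session at each rate. The 20-minute session length is purely an illustrative assumption (the note gives no session length), and the constant and function names are hypothetical.

```python
DIRECT_DIAL_CENTS_PER_MINUTE = 48    # direct-dialed call, from the note above
NETWORK_NODE_CENTS_PER_MINUTE = 17   # approximate rate via the local network node


def communications_cost_cents(minutes, cents_per_minute):
    """Connect charge in cents for one session at a flat per-minute rate."""
    if minutes < 0:
        raise ValueError("session length cannot be negative")
    return minutes * cents_per_minute


SESSION_MINUTES = 20   # ASSUMPTION: illustrative session length only

direct = communications_cost_cents(SESSION_MINUTES, DIRECT_DIAL_CENTS_PER_MINUTE)
network = communications_cost_cents(SESSION_MINUTES, NETWORK_NODE_CENTS_PER_MINUTE)
saving = direct - network
```

Whatever the session length, the network rate is roughly a third of the direct-dial rate, so the communications component of an on-line search shrinks by about two thirds.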